Genome-wide association studies of polygenic risk score-derived phenotypes may lead to inflated false positive rates

In a recent study, a polygenic risk score (PRS) for Alzheimer’s disease was used to construct a new phenotype for a subsequent genome-wide association study (GWAS). Here we show that the applied method, in which the same genetic variants are used to construct the PRS-derived phenotype as well as to assess their effect in a GWAS of the same phenotype, leads to inflated false positive rates. We illustrate this bias by simulation. We first simulate an initial discovery cohort, and run a GWAS of a disorder like Alzheimer’s disease. We then simulate a target cohort, in which we construct a PRS based on the initial GWAS results. Following the published study, we select the bottom and top 5% of individuals in the PRS distribution and define them as controls and cases, respectively. Lastly, we run a GWAS on the new PRS-derived phenotype using all genetic variants. We show that at a significance threshold of 5 × 10⁻⁸, false positive rates are inflated up to 0.004 (an 80,000-fold increase compared to 5 × 10⁻⁸). We also show that such inflation can be prevented by excluding all variants that were used to construct the PRS (as well as all variants in linkage disequilibrium) when a GWAS on a PRS-derived phenotype is conducted.

www.nature.com/scientificreports/
number of false positive associations increased to 659 (s.e.m. = 2.6). Interestingly, overlap between the discovery and target cohort inflated the false positive rate for null-SNPs used to construct the PRS-derived phenotype but deflated it for all other null-SNPs (see Supplementary Fig. 1). The reason for this is that p-values for null-SNPs will be correlated between the GWAS for AD and the GWAS for the PRS-derived phenotype when there is sample overlap (because AD and the PRS-derived phenotype are correlated and the same individuals are used). Selecting SNPs with p-values smaller than 0.05 for the PRS therefore leaves the SNPs that are not part of the PRS with p-values larger than 0.05. As a consequence, the GWAS of the PRS-derived phenotype will have deflated test statistics at null-SNPs not included in the PRS. Next, we varied the p-value threshold for inclusion in the PRS (i.e. changing the threshold from 0.05 to either 1 or 5 × 10⁻⁸, thus including either all SNPs or only genome-wide significant SNPs, respectively). We found that using all SNPs in constructing the PRS-derived phenotype reduced the inflation of false positive rates (as well as the number of false positives, see Supplementary Fig. 2). This reduction is observed because the bias is diluted across all null-SNPs and so the mean false positive rate decreases. Lowering the p-value threshold to 5 × 10⁻⁸ resulted in false positive rates that are not inflated. This is because almost no null-SNP had such a low p-value for AD, and thus almost no null-SNPs were used to construct the PRS-derived phenotype.
Lastly, we evaluated a potential power gain for causal SNPs that were not included in the PRS. We calculated the difference in test statistics between the two GWAS (i.e. Z_PRS-derived phenotype − Z_AD) and found a strong power decrease (mean difference = −0.14, p < 2.2 × 10⁻¹⁶) in the GWAS of the PRS-derived phenotype. This can be explained by the reduction in sample size and only a partial phenotypic correlation between AD and the PRS-derived phenotype. Thus, an increase in power can only be observed for causal SNPs included in the PRS. But because it is not known which SNPs are causal, true associations cannot be distinguished reliably from false positives.
To summarize, Gouveia and colleagues (2022) 1 used a new study design with the aim of improving the power of a GWAS of Alzheimer's disease. Based on simulations, we showed that this approach may inflate false positive rates by up to 80,000-fold at a genome-wide significance threshold of 5 × 10⁻⁸. The reason for this is that the same SNPs used to construct the PRS-derived phenotype were subsequently tested for association with this newly constructed phenotype. We found that the false positive rate inflation was more pronounced in the case of sample overlap between the discovery and target cohort. Our results show that false positive rates are not inflated when the GWAS of the PRS-derived phenotype is performed on SNPs that were not also used to construct the PRS. However, we note that linkage disequilibrium between SNPs included in the PRS and null-SNPs not included in the PRS could still result in an inflated false positive rate. An appealing alternative may be a leave-one-chromosome-out approach, where the PRS is constructed using 21 chromosomes, and the GWAS of the PRS-derived phenotype only uses the 22nd left-out chromosome (repeated 22 times so that all chromosomes are left out once). However, in our simulations we found a power decrease for causal SNPs that were not included in the PRS. Moreover, we note that SNPs can also be correlated across chromosomes due to e.g. non-random mating 7, which could in theory also lead to inflated false positive rates for this approach, but we are not certain about the extent of this inflation, which could well be negligible. See the Supplementary Note for a short discussion of some other approaches analyzing (partly) PRS-derived phenotypes, including an approach to improve power 8,9.
To conclude, phenotype definitions based on PRSs require careful consideration in subsequent GWAS. While excluding from the GWAS any SNP that was used to construct the PRS-derived phenotype (and any SNP in linkage disequilibrium with it) prevents inflation of false positive rates, it also leads to a loss of power for causal SNPs.
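The leave-one-chromosome-out idea discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation used in this work: the function names (`assoc_z`, `loco_prs_gwas`) are hypothetical, a correlation-based z-score stands in for a full association test, and normally distributed random numbers stand in for standardised genotypes and for discovery effect estimates. The sketch does not address correlation between SNPs on different chromosomes.

```python
import numpy as np


def assoc_z(x, y):
    """Marginal association z-score per SNP: sqrt(n) * Pearson correlation."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    r = xc.T @ yc / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return r * np.sqrt(len(y))


def loco_prs_gwas(x, chrom, weights, tail=0.05):
    """For each chromosome, build the PRS from all other chromosomes,
    define the PRS extremes, and test only SNPs on the held-out chromosome."""
    z = np.full(x.shape[1], np.nan)
    for c in np.unique(chrom):
        held_out = chrom == c
        prs = x[:, ~held_out] @ weights[~held_out]
        lo, hi = np.quantile(prs, [tail, 1 - tail])
        keep = (prs <= lo) | (prs >= hi)          # bottom and top tails only
        y = (prs[keep] >= hi).astype(float)       # 1 = top tail, 0 = bottom tail
        z[held_out] = assoc_z(x[keep][:, held_out], y)
    return z


# Toy null data (assumption): 22 "chromosomes" with 20 independent SNPs each.
rng = np.random.default_rng(0)
n, m = 4000, 440
chrom = np.repeat(np.arange(1, 23), m // 22)
x = rng.normal(size=(n, m))        # stand-in for standardised genotypes
weights = rng.normal(size=m)       # stand-in for discovery effect estimates
z_loco = loco_prs_gwas(x, chrom, weights)
print("share of null-SNPs with |z| > 1.96:", np.mean(np.abs(z_loco) > 1.96))
```

Because no tested SNP contributes to the PRS that defines its own phenotype, the share of nominally significant null-SNPs in this toy setting should stay close to the nominal level.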

Methods
Simulation. We simulated individual genotype and phenotype data based on the liability threshold model. Our chosen parameters were loosely based on Alzheimer's disease 2,3,6, with a population and sample prevalence of 5%, a SNP-heritability (h²_SNP) of 10% on the liability scale, and a PRS that explains 5% of the variance (R²) on the liability scale. We simulated a total of 170,000 SNPs in linkage equilibrium with a minimum minor allele frequency of 0.1%, as this was the number of pruned SNPs used by Gouveia et al. (2022) 1. Out of these, 1200 SNPs were causal, as previously estimated for Alzheimer's disease 6, and 168,800 were non-causal. We used the avengeme R package to calculate the number of individuals required for the discovery cohort to produce a PRS that explains the desired R² value on the liability scale 10. We simulated individuals and their liabilities, such that individuals with liabilities larger than the liability threshold were designated cases, and all others controls. We repeatedly simulated individuals until we reached the desired number of individuals (N = 366,771 discovery, N = 300,000 target). We repeated the simulation for three target cohorts. That is, within the same simulation run, one target cohort was fully independent of the discovery cohort (0% sample overlap), while in the other two, 50% and 100% of individuals, respectively, were also present in the discovery cohort. Next, we ran a GWAS in the discovery cohort using plink version 1.9 11. Using the estimated betas, we calculated PRS in the target cohorts and took the top and bottom 5% of the PRS distribution to define the PRS extremes (i.e. the PRS-derived phenotype), thus removing 90% of the sample. Lastly, we ran a second GWAS of the PRS-derived phenotype (N = 30,000) and recorded the false positive rate and the variance of test statistics. We repeated the simulation 100 times.
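The steps described above can be sketched at toy scale in Python. This is a minimal illustration under stated assumptions, not the code behind the reported results: cohort sizes and SNP counts are far smaller than in the study, a correlation-based z-score stands in for the plink association test, discovery z-scores stand in for the estimated betas as PRS weights, only the 0% overlap scenario is shown, and a nominal threshold of 0.05 replaces 5 × 10⁻⁸ because the toy sample is too small for genome-wide significance. All function and parameter names (`assoc_z`, `simulate_cohort`, `run_once`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def assoc_z(x, y):
    """Marginal association z-score per SNP: sqrt(n) * Pearson correlation."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    r = xc.T @ yc / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return r * np.sqrt(len(y))


def simulate_cohort(rng, n, maf, beta, h2, prevalence):
    """Liability threshold model with independent (linkage-equilibrium) SNPs."""
    g = rng.binomial(2, maf, size=(n, maf.size)).astype(float)
    x = (g - 2 * maf) / np.sqrt(2 * maf * (1 - maf))   # standardised genotypes
    liability = x @ beta + rng.normal(0.0, np.sqrt(1 - h2), n)
    threshold = NormalDist().inv_cdf(1 - prevalence)
    return x, (liability > threshold).astype(float)    # 1 = case, 0 = control


def run_once(seed=1, n=4000, m=1000, m_causal=50, h2=0.10,
             prevalence=0.05, p_prs=0.05, tail=0.05, alpha=0.05):
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, m)
    beta = np.zeros(m)
    beta[:m_causal] = rng.normal(size=m_causal)
    beta *= np.sqrt(h2 / (beta ** 2).sum())   # expected genetic variance = h2

    # Step 1: discovery GWAS of the disease phenotype
    x_disc, y_disc = simulate_cohort(rng, n, maf, beta, h2, prevalence)
    z_disc = assoc_z(x_disc, y_disc)
    in_prs = np.abs(z_disc) > NormalDist().inv_cdf(1 - p_prs / 2)

    # Step 2: PRS in an independent target cohort (0% sample overlap)
    x_tgt, _ = simulate_cohort(rng, n, maf, beta, h2, prevalence)
    prs = x_tgt[:, in_prs] @ z_disc[in_prs]

    # Step 3: keep the bottom and top tails as controls and cases
    lo, hi = np.quantile(prs, [tail, 1 - tail])
    keep = (prs <= lo) | (prs >= hi)
    y_new = (prs[keep] >= hi).astype(float)

    # Step 4: GWAS of the PRS-derived phenotype on all SNPs
    z_new = assoc_z(x_tgt[keep], y_new)
    hit = np.abs(z_new) > NormalDist().inv_cdf(1 - alpha / 2)
    null = np.arange(m) >= m_causal
    return {
        "n_extremes": int(keep.sum()),
        "fpr_null_in_prs": float(hit[null & in_prs].mean()),
        "fpr_null_not_in_prs": float(hit[null & ~in_prs].mean()),
    }


result = run_once()
print(result)
```

In this toy setting, the false positive rate among null-SNPs that entered the PRS is expected to be far above the nominal level, whereas null-SNPs outside the PRS should stay close to it, mirroring the pattern described for the full-scale simulation.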
We performed several model checks to ensure our simulations have the desired characteristics; specifically, we verified that the false positive rate and test statistics are not inflated for the primary GWAS of Alzheimer's disease.
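One way such a check could look is sketched below in Python. It is an assumption about the form of the check rather than a description of the exact procedure used here: the function name `null_diagnostics` is hypothetical, the genomic inflation factor (median of squared z-scores divided by the median of a chi-square distribution with one degree of freedom) is a commonly used diagnostic added for illustration, and standard normal draws stand in for z-scores at null-SNPs.

```python
import numpy as np
from statistics import NormalDist


def null_diagnostics(z, alpha=0.05):
    """Summarise test statistics that are expected to follow N(0, 1)."""
    z = np.asarray(z, dtype=float)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    chi2_median = NormalDist().inv_cdf(0.75) ** 2   # median of chi-square, 1 df
    return {
        "variance": float(z.var(ddof=1)),                        # expect ~1
        "lambda_gc": float(np.median(z ** 2) / chi2_median),     # expect ~1
        "false_positive_rate": float(np.mean(np.abs(z) > z_crit)),  # expect ~alpha
    }


rng = np.random.default_rng(0)
z_null = rng.normal(size=200_000)   # stand-in for z-scores at null-SNPs
checks = null_diagnostics(z_null)
print(checks)
```

Values of the variance or inflation factor clearly above 1, or a false positive rate clearly above the chosen alpha, would indicate inflated test statistics.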